SentiCap: Generating Image Descriptions with Sentiments
Authors
Alexander Patrick Mathews, Lexing Xie, Xuming He
Abstract
The recent progress on image recognition and language modeling is making automatic description of image content a reality. However, stylized, non-factual aspects of the written description are missing from current systems. One such style is descriptions with emotions, which are commonplace in everyday communication and influence decision-making and interpersonal relationships. We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments. We propose a novel switching recurrent neural network with word-level regularization, which is able to produce emotional image captions using only 2000+ training sentences containing sentiments. We evaluate the captions with a range of automatic and crowd-sourced metrics. Our model compares favourably in common quality metrics for image captioning. In 84.6% of cases, the generated positive captions were judged as being at least as descriptive as the factual captions. Of these positive captions, 88% were confirmed by crowd-sourced workers as having the appropriate sentiment.
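The following is a minimal, illustrative sketch (in PyTorch) of the switching-RNN idea the abstract describes: two parallel LSTM streams, one factual and one sentiment-bearing, whose word distributions are blended by a learned per-word switch probability, with a word-level regularization term that nudges the switch toward words annotated as sentiment-bearing. All module names, layer sizes, and the exact mixing and regularization forms are assumptions made for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of a switching caption RNN: a "factual" and a "sentiment"
# LSTM stream are mixed per word by a learned switch probability gamma.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchingCaptionRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, image_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.img_proj = nn.Linear(image_dim, hidden_dim)    # image feature -> initial state
        self.factual_lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.sentiment_lstm = nn.LSTMCell(embed_dim, hidden_dim)
        self.factual_out = nn.Linear(hidden_dim, vocab_size)
        self.sentiment_out = nn.Linear(hidden_dim, vocab_size)
        self.switch = nn.Linear(2 * hidden_dim, 1)           # per-word switch probability

    def forward(self, image_feats, captions):
        """image_feats: (B, image_dim); captions: (B, T) word indices."""
        B, T = captions.shape
        h0 = torch.tanh(self.img_proj(image_feats))
        c0 = torch.zeros_like(h0)
        hf, cf = h0, c0                       # factual stream state
        hs, cs = h0.clone(), c0.clone()       # sentiment stream state
        logps, switches = [], []
        for t in range(T):
            x = self.embed(captions[:, t])
            hf, cf = self.factual_lstm(x, (hf, cf))
            hs, cs = self.sentiment_lstm(x, (hs, cs))
            gamma = torch.sigmoid(self.switch(torch.cat([hf, hs], dim=1)))   # (B, 1)
            # Mixture of the two word distributions, weighted by the switch.
            p = (1 - gamma) * F.softmax(self.factual_out(hf), dim=1) \
                + gamma * F.softmax(self.sentiment_out(hs), dim=1)
            logps.append(torch.log(p + 1e-8))
            switches.append(gamma)
        return torch.stack(logps, dim=1), torch.stack(switches, dim=1)


def loss_with_word_regularization(logps, switches, targets, sentiment_mask, reg=1.0):
    """Cross-entropy plus a word-level term that encourages the switch to fire
    on words marked as sentiment-bearing (sentiment_mask in {0, 1})."""
    nll = F.nll_loss(logps.transpose(1, 2), targets)
    word_reg = F.binary_cross_entropy(switches.squeeze(-1), sentiment_mask.float())
    return nll + reg * word_reg
```

One plausible training regime, consistent with the small sentiment corpus mentioned in the abstract, is to pre-train the factual stream on a large caption dataset and fine-tune only the sentiment stream and the switch on the 2000+ sentiment-labelled sentences; this division of training is an assumption for the sketch, not a claim about the paper's exact procedure.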
Related papers
Fusing Social Networks with Deep Learning for Volunteerism Tendency Prediction
Zobrist Hashing: An Efficient Work Distribution Method for Parallel Best-First Search (Yuu Jinnai, Alex Fukunaga). VIS: Text and Vision Oral Presentations: 1326 SentiCap: Generating Image Descriptions with Sentiments (Alexander Patrick Mathews, Lexing Xie, Xuming He); 1950 Reading Scene Text in Deep Convolutional Sequences (Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, Xiaoou Tang); 1247 Creating Image...
Midge: Generating Image Descriptions From Computer Vision Detections
This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation syst...
From Image Annotation to Image Description
In this paper, we address the problem of automatically generating a description of an image from its annotation. Previous approaches either use computer vision techniques to first determine the labels or exploit available descriptions of the training images to either transfer or compose a new description for the test image. However, none of them report results on the effect of incorrect label d...
Generating Image Descriptions with Gold Standard Visual Inputs: Motivation, Evaluation and Baselines
In this paper, we present the task of generating image descriptions with gold standard visual detections as input, rather than directly from an image. This allows the Natural Language Generation community to focus on the text generation process, rather than dealing with the noise and complications arising from the visual detection process. We propose a fine-grained evaluation metric specificall...
Generating Natural Video Descriptions via Multimodal Processing
Generating natural language descriptions of visual content is an intriguing task which has wide applications such as assisting blind people. The recent advances in image captioning stimulate further study of this task in more depth including generating natural descriptions for videos. Most works of video description generation focus on visual information in the video. However, audio provides ri...
Journal:
Volume / Issue:
Pages: -
Publication date: 2016